    Automatic construction of known-item finding test beds

    This work is an initial study of the utility of automatically generated queries for evaluating known-item retrieval, and of how such queries compare to real queries. The main advantage of generating queries automatically is that numerous queries can be produced for any given test collection at minimal cost. For evaluation, this has major ramifications, as state-of-the-art algorithms can be tested on different types of generated queries that mimic particular querying styles a user may adopt. Our approach draws upon previous research in IR which has probabilistically generated simulated queries for other purposes [2, 3].

    Probabilistic hyperspace analogue to language

    Song and Bruza introduce a framework for Information Retrieval (IR) based on Gärdenfors' three-tiered cognitive model, Conceptual Spaces. They instantiate a conceptual space using Hyperspace Analogue to Language (HAL) to generate higher-order concepts, which are later used for ad-hoc retrieval. In this poster, we propose an alternative implementation of the conceptual space using a probabilistic HAL space (pHAL). To evaluate whether converting to such an implementation is beneficial, we have performed an initial investigation comparing the concept combination of HAL against pHAL for the task of query expansion. Our experiments indicate that pHAL outperforms the original HAL method and that better query term selection methods can improve performance on both HAL and pHAL.

    Updating collection representations for federated search

    To facilitate the search for relevant information across a set of online distributed collections, a federated information retrieval system typically represents each collection, centrally, by a set of vocabularies or sampled documents. Accurate retrieval therefore depends on how precisely each representation reflects the underlying content stored in that collection. As collections evolve over time, collection representations should also be updated to reflect any change; however, no solution has yet been proposed. In this study we examine the implications of out-of-date representation sets for retrieval accuracy and propose three different policies for managing the necessary updates. Each policy is evaluated on a testbed of forty-four dynamic collections over an eight-week period. Our findings show that out-of-date representations significantly degrade performance over time; however, adopting a suitable update policy can minimise this problem.

    Towards better measures: evaluation of estimated resource description quality for distributed IR

    An open problem for Distributed Information Retrieval (DIR) systems is how to represent large document repositories, also known as resources, both accurately and efficiently. Obtaining resource description estimates is an important phase in DIR, especially in non-cooperative environments. Measuring the quality of an estimated resource description is a contentious issue, as current measures do not provide an adequate indication of quality. In this paper, we provide an overview of the currently applied measures of resource description quality, before proposing the Kullback-Leibler (KL) divergence as an alternative. Through experimentation we illustrate the shortcomings of these past measures, whilst providing evidence that KL is a more appropriate measure of quality. When applying KL to compare different Query-Based Sampling (QBS) algorithms, our experiments provide strong evidence in favour of a previously unsupported hypothesis originally posited in the initial Query-Based Sampling work.

    Adaptive query-based sampling for distributed IR

    No abstract available

    FPGA-accelerated information retrieval: high-efficiency document filtering

    Power consumption in data centres is a growing issue, as the cost of the power for computation and cooling has become dominant. An emerging challenge is the development of "environmentally friendly" systems. In this paper we present a novel application of FPGAs for the acceleration of information retrieval algorithms, specifically, filtering streams/collections of documents against topic profiles. Our results show that FPGA acceleration can result in speed-ups of up to a factor of 20 for large profiles.

    A retrieval evaluation methodology for incomplete relevance assessments

    In this paper we propose an extended methodology for laboratory-based Information Retrieval evaluation under incomplete relevance assessments. This new protocol aims to identify potential uncertainty during system comparison that may result from incompleteness. We demonstrate how this methodology can lead towards a finer-grained analysis of systems. This is advantageous because the detection of uncertainty during the evaluation process can guide and direct researchers when evaluating new systems over existing and future test collections.

    An evaluation of resource description quality measures

    An open problem for Distributed Information Retrieval is how to represent large document repositories (known as resources) efficiently. To facilitate resource selection, estimated descriptions of each resource are required, especially when faced with non-cooperative distributed environments. Accurate and efficient resource description estimation is required, as this can have an effect on resource selection and, as a consequence, retrieval quality. Query-Based Sampling (QBS) has been proposed as a novel solution for resource estimation, with subsequent techniques developed thereafter. However, determining whether one QBS technique is better at generating resource descriptions than another is still an unresolved issue. The initial metrics tested and deployed for measuring resource description quality were the Collection Term Frequency ratio (CTF) and the Spearman Rank Correlation Coefficient (SRCC). The former provides an indication of the percentage of terms seen, whilst the latter measures the term ranking order, although neither considers the term frequency, which is important for resource selection. We re-examine this problem and consider measuring the quality of a resource description in the context of resource selection, where an estimate of the probability of a term given the resource is typically required. We believe a natural measure for comparing the estimated resource against the actual resource is the Kullback-Leibler Divergence (KL) measure. KL addresses the concerns put forward previously by not over-representing low-frequency terms, and by also considering term order. In this paper, we re-assess the two previous measures alongside KL. Our preliminary investigation revealed that the former metrics display contradictory results, whilst KL suggested that a QBS technique different from the one previously prescribed would provide better estimates. This is a significant result, because it now remains unclear which technique will consistently provide better resource descriptions. The remainder of this paper details the three measures and the experimental analysis of our preliminary study, and outlines our points of concern along with further research directions.
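The contrast between a coverage-style measure and KL can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy term counts and the additive smoothing constant `alpha` are all assumptions made for illustration, and the abstract does not say how terms unseen in the sample are handled.

```python
import math
from collections import Counter


def ctf_ratio(actual: Counter, estimated: Counter) -> float:
    """Share of the actual resource's term occurrences whose term
    also appears in the estimated (sampled) vocabulary."""
    total = sum(actual.values())
    covered = sum(freq for term, freq in actual.items() if term in estimated)
    return covered / total


def kl_divergence(actual: Counter, estimated: Counter, alpha: float = 0.01) -> float:
    """KL(actual || estimated) over the union vocabulary.

    ASSUMPTION: additive smoothing with constant `alpha` keeps the
    estimated probability non-zero for terms missing from the sample.
    """
    vocab = set(actual) | set(estimated)
    actual_total = sum(actual.values())
    estimated_total = sum(estimated.values()) + alpha * len(vocab)
    kl = 0.0
    for term in vocab:
        p = actual[term] / actual_total
        if p == 0.0:
            continue  # terms absent from the actual resource contribute nothing
        q = (estimated[term] + alpha) / estimated_total
        kl += p * math.log(p / q)
    return kl


# Hypothetical term counts for one resource and a sample drawn from it.
actual_counts = Counter({"retrieval": 6, "sampling": 3, "fpga": 1})
sampled_counts = Counter({"retrieval": 3, "sampling": 1})

print(ctf_ratio(actual_counts, sampled_counts))
print(kl_divergence(actual_counts, sampled_counts))
```

The CTF ratio only checks whether a term was seen at all, whereas KL compares the estimated probability of each term with its actual probability, so two samples with identical coverage can still receive different KL scores.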

    Contextual information and assessor characteristics in complex question answering

    The ciqa track investigates the role of interaction in answering complex questions: questions that relate two or more entities by some specified relationship. In our submission to the first ciqa track we were interested in the interplay between groups of variables: variables describing the question creators, the questions asked and the presentation of answers to the questions. We used two interaction forms - HTML questionnaires completed before answer assessment - to gain contextual information from the answer assessors, in order to better understand what factors influence assessors when judging retrieved answers to complex questions. Our results indicate the importance of understanding the assessor's personal relationship to the question (their existing topical knowledge, for example) and also the presentation of the answers (contextual information about the answer to aid in its assessment).

    Topic based language models for ad hoc information retrieval

    We propose a topic-based approach to language modelling for ad-hoc Information Retrieval (IR). Many smoothed estimators used for the multinomial query model in IR rely upon the estimated background collection probabilities. In this paper, we propose a topic-based language modelling approach that uses a more informative prior based on the topical content of a document. In our experiments, the proposed model provides comparable IR performance to the standard models, but when combined in a two-stage language model, it outperforms all other estimated models.
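To make the role of the background collection probabilities concrete, here is a minimal Python sketch of query likelihood scoring with Dirichlet smoothing, a standard smoothed estimator of this kind. It is not the paper's model: the abstract does not specify how the topic-based prior is built, so the sketch simply takes the prior as a parameter. The function names, the toy documents and the value of `mu` are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def collection_model(docs: List[List[str]]) -> Dict[str, float]:
    """Maximum-likelihood background model P(term | collection)."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def query_log_likelihood(
    query: List[str],
    doc: List[str],
    prior: Dict[str, float],
    mu: float = 2000.0,
) -> float:
    """log P(query | doc) under Dirichlet smoothing.

    `prior` is the distribution the document model is smoothed towards.
    Passing the background collection model gives the standard estimator;
    a more informative prior could be passed in its place (ASSUMPTION:
    the paper's topic-based prior is not reproduced here).
    """
    term_freq = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p = (term_freq[term] + mu * prior.get(term, 0.0)) / (doc_len + mu)
        if p == 0.0:
            return float("-inf")  # term unknown to both document and prior
        score += math.log(p)
    return score


# Hypothetical two-document collection.
docs = [["fpga", "filter", "stream"], ["topic", "model", "topic"]]
background = collection_model(docs)
scores = [query_log_likelihood(["topic"], doc, background) for doc in docs]
print(scores)
```

Because the prior is an argument, swapping the background model for a different distribution changes only the call site, which is the part of the estimator the abstract proposes to replace.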